Machine learning algorithms predicting bladder cancer associated with diabetes and hypertension: NHANES 2009 to 2018

Bladder cancer is 1 of the 10 most common cancers in the world. However, the relationship between diabetes, hypertension, and bladder cancer remains controversial, and few studies have used machine learning models to predict the development of bladder cancer. This study aimed to explore the association between diabetes, hypertension, and bladder cancer, and to build predictive models of bladder cancer. A total of 1789 patients from the National Health and Nutrition Examination Survey were enrolled in this study. We examined the association between diabetes, hypertension, and bladder cancer using a multivariate logistic regression model after adjusting for confounding factors. Four machine learning models, namely extreme gradient boosting (XGBoost), Artificial Neural Networks, Random Forest, and Support Vector Machine, were compared for predicting bladder cancer. Model performance was assessed by the area under the receiver operating characteristic curve, accuracy, recall, specificity, precision, and F1 score. The bladder cancer group was older on average than the non-bladder cancer group (74.4 years vs 65.6 years, P < .001), and men were more likely to have bladder cancer. Diabetes was associated with an increased risk of bladder cancer (odds ratio = 1.24, 95% confidence interval [95%CI]: 1.17–3.02). The XGBoost model was the best algorithm for predicting bladder cancer, with an accuracy of 0.978 (95%CI: 0.976–0.986) and a kappa value of 0.01 (95%CI: 0.01–0.52). The sensitivity was 0.90 (95%CI: 0.74–0.97) and the area under the curve was 0.78. These results suggest that diabetes is associated with the risk of bladder cancer and that XGBoost was the best of the 4 algorithms for predicting bladder cancer.


Introduction
Bladder cancer is 1 of the 10 most common cancers in the world. It spans a spectrum of illnesses, ranging from chronically recurrent noninvasive tumors handled noninvasively to severe or advanced-stage illness requiring multimodal and intrusive treatment.[1,2] The incidence and mortality of bladder cancer are both very high: approximately 573,278 men and women are diagnosed with bladder cancer worldwide each year, more than 1.6 million are living with the disease, and 213,000 die of it, placing a significant burden on society.[2] For men, the lifetime risk of bladder cancer is about 1.1%, whereas for women it is about 0.27%.[3] In the US, there is a documented higher occurrence of bladder cancer, accounting for more than 80,000 new cases and 17,000 deaths.[4] The highest age-adjusted incidence rates of bladder cancer are found in non-Hispanic White people (23.09 [95% confidence interval (95%CI), 22.97-23.21] per 100,000 person-years), and African Americans experience worse disease-specific outcomes and higher rates of unfavorable histology.[5,6]
Bladder cancer has numerous risk factors, such as cigarette smoking, advanced age, gender, race, and exposure to carcinogens. One study found that the average age at diagnosis of bladder cancer was between 70 and 84 years, with advanced age being the biggest risk factor.[7] In addition, numerous epidemiological studies have revealed a link between an elevated risk of bladder cancer and chronic disease conditions. With population aging, urbanization, and accompanying lifestyle changes, diabetes is becoming more common everywhere. Meanwhile, hypertension is becoming a bigger public health issue; it is a major cause of disability and the leading risk factor for death around the world. According to the International Diabetes Federation Diabetes Atlas, 9th edition, the worldwide prevalence of diabetes in 2019 was 9.3% (463 million people), and it is expected to increase to 10.2% (578 million) by 2030 and 10.9% (700 million) by 2045.[8] A systematic review and meta-analysis of cohort studies indicated statistically significant relationships of diabetes (relative risk = 1.23; 95%CI: 1.16-1.31) and hypertension (relative risk = 1.07; 95%CI: 1.01-1.13) with bladder cancer.[9] However, Bansal et al[10] conducted a meta-analysis of 45 studies showing that patients with diabetes have a statistically significant 14% lower risk of developing prostate cancer. Thus, the relationship between diabetes, hypertension, and bladder cancer is still controversial.
Machine learning (ML), a new type of artificial intelligence algorithm, can discover data patterns and correlations in large data sets and then use this information to determine the optimal course of action and prediction.[11-14] We found that few studies have used clinical data with ML to predict the development of bladder cancer. Therefore, the aim of this study was to explore the association between diabetes, hypertension, and bladder cancer using multivariate logistic regression. Additionally, we used 4 ML algorithms to predict the development of bladder cancer.

Study population
The dataset for this study came from the National Health and Nutrition Examination Survey (NHANES), conducted by the Centers for Disease Control and Prevention in the United States. The NHANES program assesses the overall health and nutrition status of adults and children, collecting information on demographic, socioeconomic, dietary, and health-related issues. In addition, laboratory tests and medical, dental, and physiological assessments are carried out by extensively trained medical professionals. NHANES uses a multistage probability sample designed to be representative of noninstitutionalized adults in the US. In the medical conditions section of the questionnaire, self-reported data were collected by personal interview. First, we used serial cross-sectional waves of NHANES from 2009 to 2018, with a total of 55,018 participants. Next, individuals were asked the question "What kind of cancer?". Patients with bladder cancer were selected for inclusion. After including our selected covariates and removing missing values, a total of 1789 patients were included in the final analysis. Detailed statistics are accessible at https://www.cdc.gov/nchs/nhanes/.

Diagnostic criteria
The diagnostic criteria for hypertension were a resting systolic blood pressure ≥140 mm Hg and/or a diastolic blood pressure ≥90 mm Hg. Diabetes was defined as a documented history in the medical record. Body mass index (BMI) was calculated as weight in kilograms divided by height in meters squared (kg/m²). Three BMI classifications were used: <25, 25 to <30, and ≥30 kg/m², reflecting underweight or normal weight, overweight, and obesity, respectively. Sleep quality was assessed by the length of sleep at night, with <7 hours considered poor.
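The definitions above translate directly into a few threshold rules. The following is a minimal Python sketch of those rules, not the authors' code (the analyses in this study were run in R); the function names are hypothetical, and the handling of a BMI of exactly 30 kg/m² as obesity follows the "≥30" cut-off stated above.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(bmi_value: float) -> str:
    """Three-level BMI classification used in the text (<25, 25 to <30, >=30)."""
    if bmi_value < 25:
        return "underweight or normal weight"
    if bmi_value < 30:
        return "overweight"
    return "obesity"


def is_hypertensive(systolic_mmhg: float, diastolic_mmhg: float) -> bool:
    """Hypertension: systolic >= 140 mm Hg and/or diastolic >= 90 mm Hg."""
    return systolic_mmhg >= 140 or diastolic_mmhg >= 90


def has_poor_sleep(hours_per_night: float) -> bool:
    """Sleep quality is considered poor when nightly sleep is under 7 hours."""
    return hours_per_night < 7


# Hypothetical participant, for illustration only.
value = bmi(70.0, 1.75)            # about 22.9 kg/m^2
print(round(value, 1), bmi_category(value))
print(is_hypertensive(138, 92))    # True: diastolic criterion met
print(has_poor_sleep(6.5))         # True
```

Either blood pressure criterion alone is sufficient, which is why the check uses a logical "or" rather than "and".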

Covariates
Other covariate data were also collected. Age was calculated as the interval from the birth date to the interview date. Gender was self-reported as male or female. Race was categorized into 5 groups: Mexican American, Other Hispanic, Non-Hispanic White, Non-Hispanic Black, and Other Race. Education was grouped into the following categories: less than high school (<grade 9), 9th to 11th grade, high school, some college, and college graduate or above. Marital status was categorized as married, widowed, divorced, separated, never married, living with partner, and refused. PIR refers to the family poverty income ratio, where a PIR value <1 indicates an income below the poverty level and a PIR value larger than 1 indicates an income over the poverty threshold. Smoking and alcohol drinking were each separated into 2 categories: yes or no. Blood pressure was measured using a mercury sphygmomanometer; 3 consecutive readings of systolic and diastolic blood pressure were taken at 5-min intervals. The mean of the 3 readings was used in the analysis.
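As an illustration of the blood pressure and PIR handling just described, the sketch below averages 3 consecutive readings and flags income below the poverty level. It is a hypothetical Python example with made-up readings, not the code used in this study; the function names are assumptions introduced here.

```python
from statistics import mean
from typing import Sequence, Tuple


def mean_blood_pressure(
    readings: Sequence[Tuple[float, float]]
) -> Tuple[float, float]:
    """Average 3 consecutive (systolic, diastolic) readings, in mm Hg."""
    if len(readings) != 3:
        raise ValueError("exactly 3 readings are expected")
    systolic = mean(float(r[0]) for r in readings)
    diastolic = mean(float(r[1]) for r in readings)
    return systolic, diastolic


def below_poverty_level(pir: float) -> bool:
    """A family poverty income ratio (PIR) below 1 indicates income below the poverty level."""
    return pir < 1


# Hypothetical readings taken at 5-min intervals, for illustration only.
sbp, dbp = mean_blood_pressure([(142, 88), (138, 86), (146, 90)])
print(sbp, dbp)                    # 142.0 88.0
print(below_poverty_level(0.8))    # True
```

Averaging before applying any diagnostic threshold reduces the influence of a single unusually high or low reading.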

Statistical analysis
Continuous variables were displayed as mean (standard deviation). Differences in continuous variables between groups were tested with the Student t test. Categorical variables were presented as percentages, and differences in categorical variables between groups were assessed by χ² tests. A multivariate logistic regression model was used to estimate the association between hypertension, diabetes, and bladder cancer. Covariates were adjusted in the following 4 models: Model 1 = unadjusted; Model 2 = adjusted for hypertension/diabetes; Model 3 = Model 2 + sex, age (continuous), education, race, and BMI; Model 4 = Model 3 + smoking, alcohol drinking, and sleep quality.
To better predict bladder cancer, we first applied random forest interpolation to fill in missing values across all data, ensuring a sufficient sample size and set of variables. After interpolation, the sample size for ML reached 36,149, providing sufficient data to support prediction. As shown in Supplementary Figure S1, http://links.lww.com/MD/L35, we drew the density plot after interpolating the severely missing drinking data and found that the interpolated values were consistent with the true values, indicating that the interpolation performed well. We selected 4 highly recognized ML algorithms: XGBoost, Artificial Neural Networks, Random Forest, and Support Vector Machine. The comparison between the 4 algorithms was based on the accuracy of the predictions and the kappa value. ML model performance was assessed using accuracy, precision, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUC). All analyses were conducted in R software, version 4.1.2 (The R Foundation for Statistical Computing, USA). Two-sided P < .05 was considered statistically significant.
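The evaluation measures named above can all be derived from a 2 × 2 confusion matrix. The following minimal Python sketch shows the standard formulas, including Cohen's kappa, which corrects accuracy for the agreement expected by chance. The class and function names are hypothetical and the counts are invented for illustration; they are not results from this study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # cases predicted as cases
    fp: int  # non-cases predicted as cases
    fn: int  # cases predicted as non-cases
    tn: int  # non-cases predicted as non-cases


def _ratio(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of failing when a denominator is empty."""
    return numerator / denominator if denominator else 0.0


def evaluate(cm: ConfusionMatrix) -> Dict[str, float]:
    n = cm.tp + cm.fp + cm.fn + cm.tn
    accuracy = _ratio(cm.tp + cm.tn, n)
    sensitivity = _ratio(cm.tp, cm.tp + cm.fn)   # also called recall
    specificity = _ratio(cm.tn, cm.tn + cm.fp)
    precision = _ratio(cm.tp, cm.tp + cm.fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    # Chance agreement: both "yes" by chance plus both "no" by chance.
    p_chance = _ratio(
        (cm.tp + cm.fp) * (cm.tp + cm.fn) + (cm.fn + cm.tn) * (cm.fp + cm.tn),
        n * n,
    )
    kappa = _ratio(accuracy - p_chance, 1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "kappa": kappa,
    }


# Hypothetical, heavily imbalanced example: 20 cases among 1000 people.
for name, value in evaluate(ConfusionMatrix(tp=2, fp=2, fn=18, tn=978)).items():
    print(f"{name}: {value:.3f}")
```

In this invented example accuracy is 0.980 while kappa is only about 0.16, because a classifier that labels almost everyone as a non-case already agrees with the truth most of the time. This is the reason for reporting kappa alongside accuracy when the outcome is rare.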

Ethical approval
The institutional review board of the National Center for Health Statistics, a division of the Centers for Disease Control and Prevention, approved the NHANES protocols. Before the study began, written informed consent was obtained from each participant.

The clinical characteristics
The basic characteristics of participants grouped according to whether they had bladder cancer are presented in Table 1. The bladder cancer group was older on average than the non-bladder cancer group (74.4 years vs 65.6 years, P = .0009). Gender was significantly associated with bladder cancer; men were more likely to have bladder cancer (P = .011). The other variables, including PIR, BMI, race, education, marital status, alcohol, smoking, hypertension, diabetes, and sleep quality, did not differ significantly between the 2 groups (P > .05).

Association between diabetes, hypertension and bladder cancer
Using logistic regression models adjusted for different confounding factors, we explored the associations between diabetes, hypertension, and bladder cancer (Table 2). In the fully adjusted model, diabetes was significantly associated with bladder cancer (odds ratio = 1.24, 95%CI: 1.17-3.02). No statistically significant association was found between hypertension and bladder cancer, suggesting that hypertension may not be a risk factor for bladder cancer.
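For readers less familiar with odds ratios, the unadjusted case can be worked by hand: with a single binary exposure, the logistic regression odds ratio equals the cross-product ratio of the 2 × 2 exposure-by-outcome table, and a Wald 95%CI is obtained on the log scale. The Python sketch below illustrates this under those assumptions; the function name and the counts are hypothetical and are not taken from Table 2, and the adjusted models in this study require fitting a full regression rather than this shortcut.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(
    exposed_cases: int,
    exposed_noncases: int,
    unexposed_cases: int,
    unexposed_noncases: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio and Wald confidence interval from a 2 x 2 table."""
    cells = (exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (exposed_cases * unexposed_noncases) / (
        exposed_noncases * unexposed_cases
    )
    # Standard error of log(OR): square root of the summed reciprocal counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_value, lower, upper = odds_ratio_with_ci(20, 80, 10, 90)
print(f"OR = {or_value:.2f}, 95%CI: {lower:.2f}-{upper:.2f}")
```

An association is conventionally called statistically significant at the .05 level when this interval excludes 1.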

ML algorithms to predict bladder cancer
Figure 1 compares the fit of the 4 ML algorithms after all covariates were entered. We concluded that the XGBoost model fit the best (accuracy = 0.978, 95%CI: 0.976-0.986; kappa value = 0.01, 95%CI: 0.01-0.52). We also produced receiver operating characteristic curves to evaluate the prediction effect of these 4 ML algorithms (Fig. 2). The AUC values were calculated for the 4 models (XGBoost, 0.78; Random Forest, 0.78; Artificial Neural Networks, 0.67; Support Vector Machine, 0.66), which demonstrated that XGBoost predicted well. To further check the evaluation results, the evaluation parameters of the 4 ML algorithms are shown in Table 3. The XGBoost algorithm had the best prediction effect on bladder cancer based on clinical data.
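The AUC has a simple probabilistic reading: it is the probability that a randomly chosen case receives a higher predicted score than a randomly chosen non-case, with ties counted as one half. The following Python sketch computes it directly from that definition. The function name and the toy scores are hypothetical, and the pairwise loop is meant for illustration rather than for data sets of the size used here.

```python
from typing import Sequence


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, non-case) pairs ranked correctly."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    noncase_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not noncase_scores:
        raise ValueError("both classes must be present")
    correct = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                correct += 1.0
            elif c == n:
                correct += 0.5   # ties count as half
    return correct / (len(case_scores) * len(noncase_scores))


# Toy example: 2 non-cases and 2 cases with made-up predicted probabilities.
print(auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, so values between 0.66 and 0.78 indicate moderate discrimination.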

Discussion
In this study, we analyzed 1789 patients from NHANES and found that advanced age and gender were important risk factors for bladder cancer. Results indicated that men were more likely to have bladder cancer, which was consistent with a previous study.[15] Men are 3 to 4 times more likely to be diagnosed with bladder cancer than women, a fact that is generally related to exposures and lifestyle.[16] At the epidemiologic level, large cohort studies have shed light on the potential contribution of sex steroids to bladder cancer; for instance, postmenopausal women have a larger chance of developing bladder cancer than premenopausal women.[17] In the Nurses' Health Study (NHS) and NHS II, younger age at menopause (≤45 years) was associated with an increased risk of bladder cancer (incidence risk ratio: 1.41, 95%CI: 1.11-1.81) relative to menopause beginning at age 50+ years.[17] At the molecular level, the underlying mechanism responsible for the differences in bladder cancer incidence may be sex differences in carcinogenic cell metabolism. Numerous clinical studies have found that the median age at diagnosis of bladder cancer is greater than for other major tumors.[18] In addition to changes in the gut and urinary tract microbiota and elevated indicators of chronic inflammation, aging is associated with increased incidence, morbidity, and mortality of bladder cancer. Age-related changes in a variety of microbiomes may be the cause of the rising incidence and mortality of bladder cancer in the elderly. These changes may generate systemic metabolic modifications that contribute to immunological dysregulation.[19]
Herein, we also explored the association between diabetes, hypertension, and bladder cancer. Diabetes has been suggested as an important risk factor for bladder cancer; however, conflicting results have emerged. Choi et al[20] found that diabetes was associated with an increased risk of bladder cancer (hazard ratio = 1.23, 95%CI: 1.17-1.28), which was consistent with our study. A nationally representative study from the Korean National Health Insurance System also showed a strongly positive association between diabetes and bladder cancer.[21] In a case-control study, diabetes was linked to an elevated risk of bladder cancer (adjusted odds ratio: 2.2, 95%CI: 1.3-3.8).[22] However, a recent meta-analysis showed that impaired fasting glucose was not associated with the risk of bladder cancer.[23] The relationship between diabetes and bladder cancer has not been fully established, and even an inverse association has been observed. The proposed biochemical connections between diabetes and bladder cancer are as follows: aberrant insulin/insulin-like growth factor axis activity, inflammatory cytokines, hyperglycemia, faulty sex hormone biosynthesis, and elevated oxidative stress that causes DNA damage.[24,25] Urinary tract infections, which are common in patients with diabetes, are another factor in the development of bladder cancer, and bacterial cell components can cause cellular growth and inflammation, potentially accelerating the development of cancer.[26] Therefore, early prevention and treatment in patients with diabetes may help reduce the occurrence of bladder cancer.
This study found no significant association between hypertension and bladder cancer, which is inconsistent with the conclusions of some studies. Epidemiologic evidence involving 79,236 propensity score-matched individuals found a positive association between hypertension and subsequent bladder cancer development.[27] In contrast, untreated hypertension was associated with a reduced risk of bladder cancer.[28,29] One proposed explanation for a link between high blood pressure and bladder cancer is that hypertension is a component of metabolic syndrome that has been linked to the later development of cancer. It is evident that there is conflicting evidence about the link between high blood pressure and bladder cancer. Over 116 million adults in the US and over 1 billion people globally suffer from hypertension, which is a major cause of cardiovascular disease (CVD) morbidity and mortality. The weighted prevalence of hypertension was 46.7%, with 80.1% of cases being uncontrolled.[30] Given that high blood pressure and bladder cancer are both serious health problems and the question of the link between the 2 remains unresolved, there is an urgent need to explore whether people with high blood pressure are at higher risk of developing bladder cancer so that appropriate prevention and treatment measures can be taken early.
ML is increasingly being used in clinical oncology to diagnose cancer, predict patient outcomes, and inform treatment plans. ML has many benefits, such as the ability to recognize trends and patterns quickly, a minimal need for human intervention, ongoing improvement, and a broad range of applications, which can significantly increase the accuracy of prognosis and diagnosis when applied to clinical data.[31] In this study, we first used multivariate logistic regression to determine the risk factors and then adopted 4 ML models to predict bladder cancer. The comparison plot showed that XGBoost outperformed the other algorithms, with an AUC of 0.78. Kouznetsova et al[32] used 2 modeling techniques, multilayer perceptron and stochastic gradient descent with a logistic regression loss function, to identify bladder cancer patients, with an accuracy of 82.54%. The XGBoost model in this study provided better sensitivity and specificity.
Table 1. The basic characteristics of participants grouped according to whether they had bladder cancer.
Xu and Huang • Medicine (2024) 103:4

However, this study has some limitations. First, the cross-sectional design of NHANES precluded establishing causality between diabetes, hypertension, and bladder cancer. Second, as in any observational study, residual confounding could not be completely ruled out; for example, diet, stress, physical activity, and genetic factors are known confounders for bladder cancer, and these variables were not accounted for in the analysis. Further studies are needed to clarify the mechanisms underlying the relationship between diabetes, hypertension, and bladder cancer.

Conclusions
Bladder cancer is the most common malignancy of the urinary tract. Our analysis further supported a significant association between diabetes and bladder cancer. Additional studies are needed to define the clinical significance of ML in identifying and predicting disease frequency and progression, as well as other risk factors for bladder cancer.

Model 1 = Unadjusted; Model 2 = Adjusted for hypertension (when studying the relationship between diabetes and bladder cancer) or adjusted for diabetes (when studying the relationship between hypertension and bladder cancer); Model 3 = Adjusted for hypertension/diabetes, sex, age (continuous), education, race, and BMI; Model 4 = Adjusted for hypertension/diabetes, sex, age (continuous), education, race, BMI, smoking, alcohol drinking, and sleep quality. 95%CI = 95% confidence interval, BMI = body mass index.

Figure 1. Comparison of accuracy and kappa value among 4 machine learning algorithms.